Abstract

Tracking Twitter for public health has shown great po- 
tential. However, most recent work has been focused 
on correlating Twitter messages to influenza rates, a 
disease that exhibits a marked seasonal pattern. In the 
presence of sudden outbreaks, how can social media 
streams be used to strengthen surveillance capacity? In 
May 2011, Germany reported an outbreak of Enterohe- 
morrhagic Escherichia coli (EHEC). It was one of the 
largest described outbreaks of EHEC/HUS worldwide 
and the largest in Germany. In this work, we study the 
crowd's behavior in Twitter during the outbreak. In par- 
ticular, we report how tracking Twitter helped to detect 
key user messages that triggered signal detection alarms 
before MedISys and other well established early warn-
ing systems. We also introduce a personalized learning
to rank approach that exploits the relationships discov- 
ered by: (i) latent semantic topics computed using La- 
tent Dirichlet Allocation (LDA), and (ii) observing the 
social tagging behavior in Twitter, to rank tweets for 
epidemic intelligence. Our results provide the grounds 
for new public health research based on social media. 



1 Epidemic Intelligence Based on Twitter 

In May 2011, an outbreak of enterohaemorrhagic Es- 
cherichia coli (EHEC) occurred in northern Germany. It was 
one of the largest described outbreaks of EHEC/HUS world-
wide and the largest in Germany (Frank et al. 2011).

On day 1 (May 19, 2011), the Robert Koch Institute (RKI),
Germany's Federal Public Health Authority, was invited by
the Health and Consumer Protection Agency in Hamburg to
assist in the investigation of three cases of hemolytic-uremic
syndrome (HUS), a life-threatening illness caused by EHEC.
On day 2 (May 20), alarmed by the type of persons affected and
the rapid spread of EHEC, RKI initiated an investigation
involving all levels of public-health and food-safety
authorities to identify the cause of the outbreak and to
prevent further cases of disease. On day 5 (May 23), RKI asked
all health departments to expedite procedures by immediately
forwarding all case reports of suspected or confirmed
EHEC/HUS to the Federal Public Health Authority, relying
directly on the diagnoses of notifying clinicians (Frank et al.
2011; RKI).

*A short version of this work has been accepted for publication at
the International AAAI Conference on Weblogs and Social Media
(ICWSM 2012).

Based on this five-day timeline of the 2011 EHEC/HUS out-
break in Germany, one can see that public health officials
are faced with new challenges for outbreak alert and re- 
sponse. This is due to the continuous emergence of infec- 
tious diseases and their contributing factors such as demo- 
graphic change, or globalization. Early reaction is necessary, 
but often communication and information flow through tra- 
ditional channels is slow. Can additional sources of infor- 
mation, such as social media streams, provide complements 
to the traditional epidemic intelligence mechanisms? 

Epidemic Intelligence (EI) encompasses activities related 
to early warning functions, signal assessments and outbreak 
investigation. Only the early detection of disease activity, 
followed by a rapid response, can reduce the impact of 
epidemics. Recently, modern disease surveillance systems 
have started to also monitor social media streams, with the 
objective of improving their timeliness to detect disease 
outbreaks, and producing warnings against potential pub- 
lic health threats (e.g., (Corley et al. 2010)). The real-time 
nature of Twitter makes it even more attractive for public 
health surveillance. 

Recent works have shown the potential of using Twitter 
for public health. These works have either focused on: the 
text classification and filtering of tweets (Sofean et al. 2012; 
Sriram et al. 2010); or finding predictors for diseases that 
exhibit a seasonal pattern (i.e., influenza-like illnesses) by 
correlating selected keywords with official influenza statis- 
tics and rates (Culotta 2010; Lampos and Cristianini 2010; 
Signorini, Segre, and Polgreen 2011). Still others have fo- 
cused on mining Twitter content for topic (Paul and Dredze 
2011a; 2011b) or sentiment analysis (Chew and Eysenbach 
2009). Furthermore, these existing approaches have all fo- 
cused on countries where the tweet density is known to be 
high (e.g., the UK, or U.S.). 

In this paper, we seek to address the issues that can help 
deliver a public health surveillance system based on Twitter, 
by taking into account two important stages in epidemic in- 
telligence: Early Outbreak Detection and Outbreak Analysis 
and Control, and take up the following questions: 

1. Early Outbreak Detection: Is it possible, by only using 
Twitter, to find early cases of an outbreak, before well es- 
tablished systems? 



2. Outbreak Analysis and Control: Is it possible to use Twitter
to understand the potential causes of contamination
and spread? How can we provide support for public
health officials to analyze and assess the risk based on
the available social media information?

In contrast to the aforementioned studies, ours focuses
on a sudden outbreak of a disease that does not involve any 
seasonal pattern. Moreover, our work shows the potential of 
Twitter in countries where the tweet density is significantly 
lower, such as Germany. The contributions of this paper are 
summarized as follows: 

• We provide an example of the application of standard 
surveillance algorithms on Twitter data collected in real- 
time during a major outbreak of EHEC/HUS in Germany, 
and provide insights showing the potential of Twitter for 
early warning. 

• For outbreak analysis and control, although many studies have
addressed systems that return documents in response to a
query, little effort has been devoted to exploiting learning
to rank in a personalized setting, especially in the domain
of epidemic intelligence. This paper presents an innovative
personalized ranking approach that offers decision
makers the most relevant and attractive tweets for risk
assessment, by exploiting latent topics and social
hash-tagging behavior in Twitter.

The rest of the paper is organized as follows: In Section 2, 
we show how an early warning based on Twitter is possible, 
we present the data collection used in our experiments and 
analysis, and the standard biosurveillance methods applied. 
In Section 3, we introduce a personalized learning to rank 
approach, based on Twitter, to support the task of analysis 
and control in the presence of a sudden outbreak. Related 
works are discussed in Section 4. Finally, in Section 5, we 
summarize our findings, point to future directions, and con- 
clude the paper. 

2 Twitter for Early Warning 

The continuous emergence of infectious diseases and their 
contributing factors impose new challenges to public health 
officials. Early reaction is necessary, but often communi- 
cation and information flow through traditional channels is 
slow. Additional sources of information, such as social me- 
dia streams, provide complements to the traditional report- 
ing mechanisms. 

For example, Figure 1 shows two plots: one corresponds to
the relative frequency of EHEC cases as reported by RKI
(RKI), and the other to the relative frequency of mentions
of the keyword "EHEC" in the tweets collected during the
months of May and June 2011. The curves are highly
correlated, with a Pearson correlation coefficient of
0.864. We can also observe the inertia of the crowd, which
continued tweeting about the outbreak even though the number
of cases was already declining (e.g., June 5 to 11).
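As an illustration of how such a coefficient can be obtained, the following minimal sketch normalizes two daily count series to relative frequencies and computes their Pearson correlation. The function names and the daily counts are hypothetical; they are not the data behind Figure 1.

```python
import math


def relative_frequency(counts):
    """Scale raw daily counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts, NOT the series reported in the paper.
reported_cases = [2, 5, 20, 60, 90, 70, 40, 20, 10, 5]
ehec_tweets = [5, 30, 200, 900, 1500, 1400, 1000, 700, 400, 200]

r = pearson(relative_frequency(reported_cases), relative_frequency(ehec_tweets))
print(round(r, 3))
```

Because the Pearson coefficient is invariant to positive rescaling, the normalization step does not change its value; it only puts both curves on a comparable axis for plotting.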

Twitter has shown potential as a source of information
for public health event monitoring (e.g., (Paul and Dredze
2011b; Sofean et al. 2012)), but could it be possible to
generate an early warning signal before well established systems
by only tracking Twitter?

Table 1: Data collected from Twitter related to the
HUS/EHEC outbreak in Germany during May and
June, 2011.

Description                                            Amount

Number of tweets collected related to medical
conditions during May and June, 2011                   7,710,231

Tweets extracted related to the EHEC/HUS
outbreak out of the ones collected                     456,226

Distinct users that produced the tweets
related to the outbreak                                54,381

In this section, we take a closer look at the time period
of the EHEC/HUS outbreak in Germany, and address this
question.

2.1 Data Collection 

We incrementally collected tweets using Twitter's API. Cur-
rently, we monitor over 500 diseases and symptoms, which
include "EHEC". One of the challenges we face when collecting
data from Twitter, besides the API restrictions, is the level 
of noise with respect to medical domain content. Straightfor- 
ward techniques relying on regular expressions, even though 
they exhibit high recall, are difficult to maintain and prone to 
high false positive rates. For example, consider the follow- 
ing two tweets collected by a combination of regular expres- 
sions, and a dictionary of diseases that includes the medical 
conditions EHEC and fever. 

1. RKI warns against north German vegetables: Experts 
looking feverishly EHEC source http://bit.ly/itGpJx 

2. I've definitely Bieber-fever. There's no doubt, but who 
hasn't got bieber fever? @justinbieber is soo damn 
rawwwr 

Tweet number one is of obvious importance for epidemic 
intelligence, but number two is not. 
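The false positive can be reproduced with a small sketch of dictionary-based matching. The two-term dictionary and the helper name are assumptions made for illustration; they do not describe the actual collection pipeline.

```python
import re

# Hypothetical two-entry dictionary; the monitored list has over 500 terms.
DISEASE_TERMS = ["EHEC", "fever"]

# Case-insensitive pattern that matches any dictionary term as a substring.
PATTERN = re.compile("|".join(re.escape(t) for t in DISEASE_TERMS), re.IGNORECASE)


def matched_terms(tweet):
    """Return the sorted, lower-cased dictionary terms found in a tweet."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(tweet)})


tweets = [
    "RKI warns against north German vegetables: Experts looking "
    "feverishly EHEC source http://bit.ly/itGpJx",
    "I've definitely Bieber-fever. There's no doubt, but who hasn't got "
    "bieber fever? @justinbieber is soo damn rawwwr",
]

for tweet in tweets:
    print(matched_terms(tweet))
```

Both tweets are accepted by the pattern, the second one only because of "fever", so keyword matching alone cannot separate the outbreak report from the pop-culture message.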

Instead of simple keyword matching to filter out irrelevant 
tweets, our data collection strategy includes text classifica- 
tion methods and a multi-level filtering based on supervised 
learning, following the approach of Stewart et al. (Stewart, 
Smith, and Nejdl 2011). 

Table 1 summarizes the data collected related to the out- 
break that was used in our analysis. 

2.2 Detection Methods 

The surveillance algorithms we used are well documented in
the disease aberration literature (e.g., Khan 2007; Hutwagner
et al. 2003; Basseville and Nikiforov 1993). The objective
of these algorithms is to detect aberration patterns in time
series data when the volume of an observation variable
exceeds an expected threshold value. In our case, for example,
the observation variable corresponds to mentions of the medical
condition "EHEC" within the tweets.

Figure 1: Relative frequency of cases reported to RKI and the number of tweets mentioning the name of the disease:
EHEC. The Pearson correlation coefficient is 0.864. Monitoring Twitter allowed us to generate the first signal on Friday,
May 20th, 2011, using standard biosurveillance methods, before well established early warning systems (triangle on the
time axis).



The five biosurveillance algorithms we used for early
detection are: the Early Aberration Reporting System (EARS)
(1) C1, (2) C2, and (3) C3 algorithms, (4) F-statistic, and (5)
Exponential Weighted Moving Average (EWMA). Please
refer to (Khan 2007) for a detailed introduction.
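To give a flavor of the last method in the list, the sketch below computes a textbook EWMA control statistic over a series of daily counts. It is not the implementation used in the experiments. The function name, the standard-deviation floor, the toy counts, and the reading of 0.24 as the smoothing weight are assumptions made for illustration.

```python
import math
import statistics


def ewma_alarms(counts, train=15, buffer=5, weight=0.24, threshold=4.0, min_sd=0.5):
    """Return (day_index, statistic) pairs whose EWMA statistic exceeds threshold."""
    # Scaling factor of the asymptotic standard deviation of an EWMA.
    scale = math.sqrt(weight / (2.0 - weight))
    alarms = []
    z = float(counts[0])  # running EWMA, started at the first observation
    for t in range(1, len(counts)):
        z = weight * counts[t] + (1.0 - weight) * z
        if t < train + buffer:
            continue  # not enough history for a baseline yet
        # Baseline: `train` days that end `buffer` days before day t.
        baseline = counts[t - buffer - train : t - buffer]
        mu = statistics.mean(baseline)
        sd = max(statistics.stdev(baseline), min_sd)  # assumed floor against sd = 0
        stat = (z - mu) / (sd * scale)
        if stat > threshold:
            alarms.append((t, stat))
    return alarms


# Hypothetical daily mention counts: quiet background, then a sudden rise.
counts = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 0, 2, 1, 0, 1,
          1, 0, 1, 2, 1, 1, 1, 40, 120]

for day, stat in ewma_alarms(counts):
    print(day, round(stat, 1))
```

The buffer keeps the most recent days out of the baseline, so the first days of an emerging outbreak do not inflate the expected level against which later days are compared.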

We signal an alarm if the test statistic reported by the de- 
tection methods exceeds a threshold value, which is deter- 
mined experimentally. The larger the amount by which the 
threshold is exceeded, the greater the severity of the alarm. 
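The thresholding idea can be sketched for the three EARS statistics. The sketch uses the commonly cited EARS definitions (a 7-day baseline, a 2-day lag for C2, and C3 as the sum of the last three C2 exceedances above 1); the experiments reported here use a different parametrization (Table 2), which can be passed through the hypothetical `window` and `lag` arguments. The standard-deviation floor and the toy counts are further assumptions.

```python
import statistics


def z_score(counts, t, window, lag, min_sd=0.5):
    """Standardized deviation of day t from a lagged moving baseline."""
    baseline = counts[t - lag - window : t - lag]
    mu = statistics.mean(baseline)
    sd = max(statistics.stdev(baseline), min_sd)  # assumed floor against sd = 0
    return (counts[t] - mu) / sd


def ears(counts, window=7, lag=2):
    """Return C1, C2 and C3 for every day that has enough history."""
    first = window + lag
    c2 = {t: z_score(counts, t, window, lag) for t in range(first, len(counts))}
    rows = []
    for t in range(first, len(counts)):
        c1 = z_score(counts, t, window, 0)  # C1: baseline ends the day before
        # C3: accumulate the part of C2 above 1 over the last three days.
        c3 = sum(max(0.0, c2[s] - 1.0) for s in range(t - 2, t + 1) if s in c2)
        rows.append({"day": t, "C1": c1, "C2": c2[t], "C3": c3})
    return rows


# Hypothetical daily counts: background noise, then five mentions on the last day.
counts = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 0, 5]

for row in ears(counts):
    flags = [name for name, limit in (("C1", 3), ("C2", 3), ("C3", 2))
             if row[name] > limit]
    print(row["day"], flags)
```

With a quiet baseline, even a handful of mentions on a single day pushes all three statistics over their conventional limits, which is the behavior exploited for early warning.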

Table 2 summarizes the alarm dates and detection meth- 
ods parametrization, which follows the guidelines of N. Col- 
lier (Collier 2010). 

Using any of the detection methods (Table 2), a daily
count of less than five tweets was enough to signal an alert
on May 20th, 2011. The Early Warning and Response System
(EWRS) [1] of the European Union received a first
communication from the German authorities on Sunday, May 22.
MedISys [2] detected the first media report in the German
newspaper Die Welt [3] on Saturday, May 21 (Linge et al.
2011), and ProMED-mail [4] and all other major early alerting
systems (e.g., ARGUS, Biocaster, GPHIN, HealthMap,
PULS) covered the event on Monday, May 23.

[1] EWRS: ewrs.ecdc.europa.eu
[2] MedISys: medusa.jrc.it/medisys
[3] Die Welt: welt.de
[4] ProMED-mail: promedmail.org

Why was this early detection possible with respect to well
established early warning systems? We tracked only Twitter
as a source of information, in contrast to MedISys, for example,
which tracks hundreds of news sources on the Internet. We
consider that Twitter's diversity was the key element that helped
in the earlier detection of the event.

Table 2: Detection method parameters and alarm dates

Detection     Parametrization                                Alarm
Method        (Khan 2007; Collier 2010)                      Dates

C1            Training window = 15 days; buffer = 5 days;    May 20 to
              upper control limit = $\mu + 3\sigma$          May 28

C2            Training window = 15 days; buffer = 5 days;    May 20 to
              upper control limit = $\mu + 3\sigma$;         May 28
              alarm threshold = 0.2

C3            Training window = 15 days; buffer = 5 days;    May 20 to
              upper control limit = $\mu + 3\sigma$;         May 24
              alarm threshold = 0.3

F-statistic   Training window = 15 days; buffer = 5 days;    May 20 to
              alarm threshold = 0.6                          June 30

EWMA          Training window = 15 days; buffer = 5 days;    May 20 to
              alarm threshold = 4, u = 0.24                  May 30

Twitter is a diverse stream of multiple sources. In Twitter,
contributions converge from the crowd: millions of
individual users, obscure and renowned; big and small media
outlets; global and local newspapers; etc. Our work and that
of MedISys focus on an analysis at a national level, but there
are cases where support for the local perspective is important,
for example local and smaller newspapers reaching a
broader audience through Twitter.



A closer look at May 20 reveals that the first alarm
was triggered based on five tweets. The actual messages are
shown in Figure 2; all of them were generated from sources not
far from where the first cases of the outbreak were reported.
Those users acted as local sensors, producing tweets that
spread the news faster than major newspapers.




[Figure: screenshots of the tweets. The legible fragments relay a
dpa/lno news item from Hamburg reporting several people infected
with EHEC.]

Figure 2: The 5 tweets that triggered the first signal for
disease EHEC on May 20, 2011.

Table 3: Learning to Rank: Summary of Notations

Notations                                          Explanations

$Q = \{q_1, \dots, q_{|Q|}\}$                      Set of queries
$q_i \in Q$                                        Query
$D = \{d_1, \dots, d_{|D|}\}$                      Set of documents
$d_j \in D$                                        Document
$Y = \{y_1, \dots, y_{|Y|}\}$                      Set of relevance judgments
$y_{ij} \in Y$                                     Relevance judgment of query-document
                                                   pair $(q_i, d_j) \in Q \times D$
$\phi(q_i, d_j)$                                   Feature vector w.r.t. $(q_i, d_j)$
$\phi_k(q_i, d_j)$                                 $k$-th dimension of $\phi(q_i, d_j)$
$T = \{(q_i, d_j), \phi(q_i, d_j), y_{ij}\}$       Training set, with $1 \le i \le |Q|$
                                                   and $1 \le j \le |D|$

3 Twitter for Outbreak Analysis and Control 

For public health officials, who are participating in the in- 
vestigation of an outbreak, the millions of documents pro- 
duced over social media streams represent an overwhelming 
amount of information for risk assessment. 

To reduce this overload we explore to what extent rec- 
ommender systems techniques can help to filter informa- 
tion items according to the public health users' context and 
preferences (e.g., disease, symptoms, location). In particu- 
lar, we focus on a personalized learning to rank approach 
that ultimately offers the user the most relevant and attrac- 
tive tweets for risk assessment. In this section, we introduce 
our approach and report an experimental evaluation on the 
EHEC/HUS dataset collected from Twitter. 

3.1 Background: Learning to Rank for IR 

Learning to rank for Information Retrieval (L2R) is an ac- 
tive area of recent research (Qin et al. 2010). L2R is set 
as a supervised learning task that considers fundamentally 
two phases: learning and retrieval. In learning (training), a 
collection of queries and their corresponding retrieved doc- 
uments are given. Furthermore, the labels (i.e., relevance 
judgments) of the document with respect to the queries are 
also available. The relevance judgments, provided by human 
annotators, can represent ranks (e.g., categories in a total or- 
der) or binary labels (e.g., relevant or not-relevant). The ob- 
jective of learning is to construct a ranking model w, e.g., 
a ranking function, that achieves the best result on test data 
in the sense of optimization of a performance measure (e.g., 
error rate, degree of agreement between the two rankings, 
classification accuracy or mean average precision). 

In retrieval (test phase), given a query-document pair,
the learned ranking function is applied, returning a ranked
list of documents in descending order of their relevance
scores. More formally, suppose that $Q = \{q_1, \dots, q_{|Q|}\}$
is the set of queries, and $D = \{d_1, \dots, d_{|D|}\}$ the set of
documents. The training set is created as a set of
query-document pairs, $(q_i, d_j) \in Q \times D$, upon which a
relevance judgment (e.g., a label) indicating the relationship
between $q_i$ and $d_j$ is assigned by an annotator. Suppose that
$Y = \{y_1, \dots, y_{|Y|}\}$ is the set of labels and $y_{ij} \in Y$
denotes the label of query-document pair $(q_i, d_j)$. A feature
vector $\phi(q_i, d_j)$ is created from each query-document pair
$(q_i, d_j)$, $i = 1, 2, \dots, |Q|$; $j = 1, 2, \dots, |D|$. The training
set is denoted as $T = \{(q_i, d_j), \phi(q_i, d_j), y_{ij}\}$. The ranking
model is a real valued function of features:

$$f(q, d) = w \cdot \phi(q, d) \tag{1}$$

where $w$ denotes a weight vector. In ranking, for query $q_i$ the
model associates a score to each of the documents $d_j$ as their
degree of relevance with respect to query $q_i$ using $f(q_i, d_j)$,
and sorts the documents based on their scores.

Table 3 gives a summary of notations described above. 
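The scoring and sorting step of Eq. (1) can be written in a few lines. The weight vector, the document identifiers and the feature values below are hypothetical and serve only to show the mechanics.

```python
def score(w, phi):
    """Eq. (1): dot product of the weight vector and a feature vector."""
    return sum(wk * fk for wk, fk in zip(w, phi))


def rank(w, features_by_doc):
    """Return document ids sorted by descending score for one query."""
    return sorted(features_by_doc,
                  key=lambda doc: score(w, features_by_doc[doc]),
                  reverse=True)


# Hypothetical weights and feature vectors phi(q, d) for a single query q.
w = [2.0, 1.0, 0.5]
features_by_doc = {"d1": [1, 0, 0], "d2": [1, 1, 1], "d3": [0, 1, 0]}

print(rank(w, features_by_doc))
```

Learning to rank then amounts to choosing w so that the ordering induced by these scores agrees with the relevance judgments in the training set.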

Pairwise approaches, such as Ranking SVM (Joachims 
2002) or Stochastic Pairwise Descent (Sculley 2009), have 
proved successful in addressing the L2R task. A compre- 
hensive study on different learning to rank techniques can 
be found in (Liu 2009). 

Although much work has been carried out on L2R tech- 
niques for systems that return documents in response to a 
query, little effort has been devoted to exploiting L2R in a
personalized setting, especially in the domain of epidemic in-
telligence.

3.2 Our Approach: Ranking Tweets for Epidemic Intelligence

We propose to use the user context as implicit criteria to
select tweets of potential relevance, that is, we will rank and
derive a short list of tweets based on the user context. The
user context $C_u$ is defined as a triple

$$C_u = (t, MC_u, L_u), \tag{2}$$

where $t$ is a discrete time interval, $MC_u$ the set of medical
conditions, and $L_u$ the set of locations of user interest.

We define three concepts that will help us to discuss our
approach in the rest of the section:



Algorithm 1 Personalized Tweet Ranking algorithm for
Epidemic Intelligence (PTR4EI)

Input: User Context $C_u = (t, MC_u, L_u)$,
   Inverted index $T$ of tweets collected for epidemic
   intelligence before time $t$

Output: Ranking Function $f_{C_u}$ for User Context $C_u$

1: Compute LDA topics (topicsLDA) on $T$
2: Consider each $mc \in MC_u$ as a hash-tag, and extract
   from $T$ all co-occurring hash-tags: coHashTags
3: Classify the terms in topicsLDA and the hash-tags in
   coHashTags as Medical Condition $MC_x$, Location $L_x$
   or Complementary Context $CC_x$
4: Build a set of queries as follows:
   $Q = \{q \mid q \in MC_u \times \mathcal{P}(L_u \cup MC_x \cup L_x \cup CC_x)\}$
5: For each query $q_i \in Q$ obtain tweets $D$ from the
   collection $T$
6: Elicit relevance judgments $Y$ on a subset $D_y \subset D$
7: For each tweet $d_j \in D$, obtain the feature vector
   $\phi(q_i, d_j)$ w.r.t. $(q_i, d_j) \in Q \times D$
8: Apply learning to rank to obtain a ranking function for
   the user context $C_u$: $f_{C_u}(q, d) = w \cdot \phi(q, d)$
9: return $f_{C_u}(q, d)$



Medical Condition is a string that describes a human 
medical condition, such as a disease, disorder or syndrome. 
We represent the set of medical conditions as MC. 

Location is a string that is used to identify a point or an 
area on the Earth's surface, which can be mapped to a spe- 
cific pairing of latitude and longitude. The set of locations is 
denoted as L. 

Complementary Context is defined as the set of nouns, 
which are neither Locations nor Medical Conditions. Com- 
plementary Context may include named entities such as 
names of persons, organizations, affected organisms,
expressions of time, quantities, etc. We denote the set of named
entities that represents the complementary context as $CC$,
where $CC \cap (L \cup MC) = \emptyset$.

Our Personalized Tweet Ranking for Epidemic Intelligence
algorithm, or PTR4EI, is shown in Algorithm 1. The
algorithm extends a learning to rank framework (Section 3.1)
by considering a personalized setting that exploits the user's
individual context.

More precisely, we consider the context of the user, $C_u$,
and prepare a set of queries, $Q$, for a target event (e.g., a
disease outbreak). We first compute LDA (Blei, Ng, and Jordan
2003) on an indexed collection $T$ of tweets for epidemic
intelligence, where not all tweets are necessarily interesting
for the target event.

Table 4: Four LDA topics (columns) computed weekly
during the main period of the outbreak: from May 23
to June 19, 2011. We classify terms within each topic as
Medical Condition (MC), Location (L), or Complementary
Context (CC).

         Topic 1                Topic 2                Topic 3                Topic 4

Week 21  EHEC (MC)              fever (MC)             EHEC (MC)              EHEC (MC)
         cucumbers (CC)         pain (MC)              casualty (-)           pathogen (MC)
         Spain (L)              headache (MC)          women (CC)             Northern Germany (L)
         tomatoes (CC)          sniff (MC)             intestinal germ (MC)   diarrhea (MC)
         salad (CC)             pain (MC)              panic (MC)             dead (MC)

Week 22  EHEC (MC)              EHEC (MC)              EHEC (MC)              EHEC (MC)
         dead (MC)              intestinal germ (MC)   cucumbers (CC)         cucumbers (CC)
         Germany (L)            source (-)             pathogen (MC)          salad (CC)
         people (-)             search (-)             Spain (L)              pain (MC)
         live (-)               Hamburg (L)            farmers (CC)           women (CC)

Week 23  EHEC (MC)              headache (MC)          EHEC (MC)              EHEC (MC)
         cucumber (CC)          pain (MC)              cucumbers (CC)         sprout (CC)
         eu (CC)                fever (MC)             sprout (CC)            source (-)
         crisis management (-)  people (-)             pathogen (MC)          suspicion (-)
         farmers (CC)           cough (MC)             salad (CC)             hus (MC)

Week 24  EHEC (MC)              headache (MC)          stomach ache (MC)      pain (MC)
         germ (MC)              fever (MC)             sniff (MC)             bellyache (MC)
         sprout (CC)            slept (-)              pain (MC)              cough (MC)
         health (MC)            sniff (MC)             regions (-)            throat (CC)
         all-clear (CC)         head (CC)              examined (-)           sniff (MC)

We also extract the hash-tags that co-occur with the user
context by considering the medical conditions and locations
in $C_u$ as hash-tags themselves, and find which other
hash-tags co-occur with them within a tweet, and how often they
co-occur, which helps us to select the most representative
hash-tags for the target event.
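A minimal sketch of this co-occurrence count is given below. The hash-tag pattern, the function name and the example tweets are assumptions made for illustration and do not reproduce the indexed collection.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def co_occurring_hashtags(tweets, seeds):
    """Count hash-tags that appear in the same tweet as any seed hash-tag."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags & seeds:
            # Each co-occurring tag is counted once per tweet.
            counts.update(tags - seeds)
    return counts


# Hypothetical tweets, not taken from the collection.
tweets = [
    "#EHEC outbreak in #Hamburg, avoid #cucumbers",
    "New #ehec cases reported by #RKI in #hamburg",
    "Nice weather in #Hamburg today",
    "#EHEC #EHEC #hus",
]

print(co_occurring_hashtags(tweets, {"EHEC"}).most_common())
```

Sorting by frequency gives a simple way to keep only the most representative hash-tags per time window.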

The set $Q$ is constructed by expanding the original terms
in $C_u$ with the ones in the LDA topics and co-occurring
hash-tags, which are previously classified as medical
condition, location or complementary context.
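Step 4 of Algorithm 1 pairs each medical condition of the user context with subsets of the expansion terms. Since a full power set grows exponentially, the sketch below caps the subset size; this cap, the function names and the example terms are assumptions that are not part of the algorithm as stated.

```python
from itertools import chain, combinations, product


def bounded_power_set(terms, max_size):
    """All subsets of `terms` with at most `max_size` elements, as tuples."""
    ordered = sorted(terms)
    return chain.from_iterable(
        combinations(ordered, k) for k in range(max_size + 1)
    )


def build_queries(mc_u, l_u, mc_x, l_x, cc_x, max_size=2):
    """Q = MC_u x P(L_u u MC_x u L_x u CC_x), with an assumed size cap."""
    expansion = set(l_u) | set(mc_x) | set(l_x) | set(cc_x)
    return [
        (mc,) + subset
        for mc, subset in product(sorted(mc_u),
                                  bounded_power_set(expansion, max_size))
    ]


# Hypothetical expansion terms for the user context used in the evaluation.
queries = build_queries(
    mc_u={"EHEC"},
    l_u={"Lower Saxony"},
    mc_x={"diarrhea"},
    l_x={"Hamburg"},
    cc_x={"cucumbers"},
)

print(len(queries))
print(queries[:3])
```

Each tuple can then be turned into a conjunctive query against the index; the empty subset yields the original term on its own.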

We build the set $D$ of tweets by querying index $T$ using
$q \in Q$ as query terms. Next, we elicit judgments from
experts on a subset of the tweets retrieved, in order to construct
$D_y \subset D$.

We then obtain for each tweet $d_j \in D$ its feature vector
$\phi(q_i, d_j)$ with respect to the pair $(q_i, d_j) \in Q \times D$.

Finally and with these elements, we apply a learning to 
rank algorithm to obtain the ranking function for the given 
user context. 

In the rest of the section, we evaluate our approach con- 
sidering as event of interest the EHEC/HUS outbreak in Ger- 
many, 2011. 

Experiments and Evaluation. To support users in the
assessment and analysis during the EHEC/HUS outbreak,
we set the user context (Eq. 2) as $C_u = (t, MC_u, L_u)$ =
([2011-05-23; 2011-06-19], {"EHEC"}, {"Lower Saxony"}).
In this way, we are taking into account the main period of
the outbreak [5], the disease of interest, and the German state
with more cases reported.

[5] Please note that even though the main period of the outbreak
is considered for the evaluation, nothing prevents us from building the
model during the ongoing outbreak and recomputing it periodically
(e.g., weekly).

Following Algorithm 1, we computed LDA and extracted
the co-occurring hash-tags using the indexed collection $T$
described in Section 2.1. Table 4 shows four LDA topics for
each week of the time period of interest, and Table 5 presents
the hash-tags co-occurring with #EHEC.

Table 5: Hash-tags co-occurring with #EHEC between
May 23 and June 19, 2011, the main period of the
outbreak. The hash-tags are classified as entities of type
Medical Condition, Location, or Complementary Context;
hash-tags outside these categories are discarded.

         Medical Condition     Location          Complementary Context

Week 21  bacteria              bremen            cucumber_salad  cdu
         diarrhea              cuxhaven          cucumbers       edeka
         ehec_victim           hamburg           ehec_vegetable  fdp
         hus                   münster           tomatoes        merkel
         intestinal_infection  northern_germany  vegetables      rki

Week 22  bacteria              berlin            cucumbers       bild
         diarrhea              germany           obst            fdp
         ehec_pathogen         hamburg           salad           n24
         hus                   lübeck            terror          rki
         intestinal_infection  spain             tomatoes        rtl

Week 23  bacteria              bavaria           cucumbers       ehec_free
         diarrhea              berlin            salad           fdp
         ehec_pathogen         germany           sojasprout      merkel
         hus                   hamburg           sprout          n24
         intestinal_infection  lower_saxony                      rki

Week 24  bacteria              lower_saxony      donate_blood
         died                                    ehec_free
         health                                  sojasprout
         hus

We asked three experts, one from the Robert Koch Institute
and the other two from the Lower Saxony State Health
Department (NLGA) [6], to provide their individual judgment
on a subset $D_y$ of 240 tweets, evaluating for each tweet whether it
was relevant or not to support their analysis of the outbreak.
Any disagreements in the assigned relevance scores were
resolved by majority voting.
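For binary judgments from three annotators, majority voting reduces to a simple count. The sketch below illustrates this; the tweet identifiers and the votes are hypothetical.

```python
def majority_vote(judgments):
    """Resolve binary relevance judgments (1 or 0) by simple majority."""
    if len(judgments) % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    return 1 if sum(judgments) > len(judgments) / 2 else 0


# Hypothetical judgments of three annotators for three tweets.
judgments_by_tweet = {
    "tweet_a": [1, 1, 0],
    "tweet_b": [0, 0, 1],
    "tweet_c": [1, 1, 1],
}

labels = {tweet: majority_vote(votes)
          for tweet, votes in judgments_by_tweet.items()}
print(labels)
```

With an odd number of annotators and two classes a tie cannot occur, so every tweet receives exactly one label.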

We selected these tweets from the index $T$ as follows: 30
were obtained using as query the term "EHEC", i.e., $MC_u$,
together with the medical conditions identified using LDA,
and 30 using the medical conditions from the hash-tags. We
used a similar procedure combining the query "EHEC" with the
locations and complementary context extracted from LDA
and hash-tag co-occurrence, obtaining 30 tweets at every
step, for a total of 120 tweets. For the remaining 60, we used the
query term "EHEC" alone, then we ordered the result set
chronologically based on the tweets' publication date, and
selected the most recent ones.

We prepared five binary features for each tweet as follows:

Feature                  Value = True

$F_{MC}$                 If a medical condition is present in the tweet
$F_{L}$                  If a location is present in the tweet
$F_{\#\text{-tag}}$      If a hash-tag is present in the tweet
$F_{CC}$                 If a complementary context term is present in the tweet
$F_{URL}$                If a URL is present in the tweet
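These features could be extracted along the following lines. The sketch uses naive case-insensitive substring matching against hypothetical term sets; it is an assumption made for illustration, and such matching is prone to the false positives discussed in Section 2.1.

```python
import re

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")


def contains_any(text, terms):
    """True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def binary_features(tweet, medical_conditions, locations, complementary):
    """Return [F_MC, F_L, F_#-tag, F_CC, F_URL] as 0/1 values."""
    return [
        int(contains_any(tweet, medical_conditions)),  # F_MC
        int(contains_any(tweet, locations)),           # F_L
        int(bool(HASHTAG.search(tweet))),              # F_#-tag
        int(contains_any(tweet, complementary)),       # F_CC
        int(bool(URL.search(tweet))),                  # F_URL
    ]


# Hypothetical term sets and tweets.
MC = {"EHEC", "HUS"}
LOC = {"Spain", "Hamburg"}
CC = {"cucumbers", "sprout"}

print(binary_features(
    "#EHEC: cucumbers from Spain under suspicion http://example.org/x",
    MC, LOC, CC))
print(binary_features("Feeling fine today", MC, LOC, CC))
```

The resulting vector plays the role of $\phi(q_i, d_j)$ in Eq. (1), with one dimension per feature in the table above.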

For learning the ranking function, we used the Stochastic
Pairwise Descent (SPD) algorithm (Sculley 2009),
which solves the same optimization problem as Ranking
SVM (Joachims 2002), but using stochastic gradient
descent, whose characteristics make it more appealing to scale
to larger datasets (e.g., (Bottou 2010)).
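The idea behind pairwise stochastic training can be sketched as follows: sample a relevant and a non-relevant tweet, form the difference of their feature vectors, and take a gradient step on a regularized hinge loss. This is not the implementation of (Sculley 2009). The step-size schedule, the regularization constant, the function names and the deliberately easy toy data are all assumptions.

```python
import random


def score(w, x):
    """Linear ranking score, as in Eq. (1)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def spd_train(examples, n_features, steps=2000, lam=0.01, seed=0):
    """Learn w from (feature_vector, label) pairs of a single query."""
    rng = random.Random(seed)
    relevant = [x for x, y in examples if y == 1]
    non_relevant = [x for x, y in examples if y == 0]
    w = [0.0] * n_features
    for t in range(1, steps + 1):
        a, b = rng.choice(relevant), rng.choice(non_relevant)
        diff = [ai - bi for ai, bi in zip(a, b)]  # a should outrank b
        eta = 1.0 / (lam * t)                     # assumed step-size schedule
        margin = score(w, diff)
        # Step on lam/2 * ||w||^2 + max(0, 1 - w . diff).
        w = [(1.0 - eta * lam) * wi for wi in w]
        if margin < 1.0:
            w = [wi + eta * di for wi, di in zip(w, diff)]
    return w


# Hypothetical training data: [F_MC, F_L, F_#-tag, F_CC, F_URL] and a label.
examples = [
    ([1, 1, 1, 1, 1], 1),
    ([1, 1, 0, 1, 1], 1),
    ([1, 0, 1, 1, 0], 1),
    ([1, 0, 0, 0, 0], 0),
    ([0, 0, 0, 0, 0], 0),
]

w = spd_train(examples, n_features=5)
for x, y in sorted(examples, key=lambda e: score(w, e[0]), reverse=True):
    print(y, x)
```

Each update touches a single sampled pair, so the cost per step does not depend on the number of candidate pairs, which is what makes this family of methods attractive for large collections.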

We compared our approach, which expands the user context
with latent topics and socially generated hash-tags, against two
ranking methods:

• RankMC: It learns a ranking function using only medical
conditions as a feature, i.e., $F_{MC}$. Please note that this
baseline also considers medical conditions related to the
ones in $MC_u$, which makes it stronger than non-learning
approaches, such as BM25 or TF-IDF scores, that use
only the $MC_u$ elements as query terms.

• RankMCL: It is similar to RankMC, but besides the medical
conditions, it uses a local context to perform the ranking
(i.e., features $F_{MC}$ and $F_{L}$). We expect this method
to perform better than RankMC, since it takes into account
not only the spatial information from the user
context, but also additional locations in the collection.

We conducted 10-fold cross validation experiments. For 
each fold, we used 80% of the tweets for training and the 
remaining 20% for testing. The test set is used to evaluate the 
ranking methods. The reported performance is the average 
over the ten folds. 

Evaluation Measures. For evaluation, we used three
measures widely used in information retrieval,
namely precision at position n (P@n), mean average
precision (MAP), and normalized discounted cumulative gain
(NDCG). Their definitions are as follows.

Precision at Position n (P@n) (Baeza-Yates and
Ribeiro-Neto 2011) measures the relevance of the top n
documents in the ranking list with respect to a given query:

$$P@n = \frac{\text{\# of relevant docs in top } n \text{ results}}{n} \tag{3}$$

Mean Average Precision (MAP). The average precision
(AP) (Baeza-Yates and Ribeiro-Neto 2011) of a given query
is calculated as Eq. (4), and corresponds to the average of
P@n values for all relevant documents:

$$AP = \frac{\sum_{n=1}^{N} \left( P@n \cdot rel(n) \right)}{\text{\# of relevant docs for this query}} \tag{4}$$

where $N$ is the number of retrieved documents, and
$rel(n)$ is a binary function that evaluates to 1 if the $n$-th
document is relevant, and 0 otherwise. Finally, MAP
(Baeza-Yates and Ribeiro-Neto 2011) is obtained by averaging the AP
values over the set of queries.

Normalized Discounted Cumulative Gain (NDCG). For
a single query, the NDCG (Järvelin and Kekäläinen 2002)
value of its ranking list at position n is computed by Eq. (5):

$$NDCG@n = Z_n \sum_{j=1}^{n} \frac{2^{r(j)} - 1}{\log(1 + j)} \tag{5}$$

where $r(j)$ is the rating of the $j$-th document in the
ranking list, and the normalization constant $Z_n$ is chosen so
that the perfect list gets an NDCG score of 1.

[Figure: bar chart of ranking performance, MAP and
NDCG@{1, 3, 5, 10}, for RankMC (baseline), RankMCL (baseline)
and PTR4EI.]

Figure 3: MAP and NDCG Results

Table 6: Ranking Performance in terms of P@{1, 3, 5, 10}

Method               P@1     P@3      P@5    P@10

RankMC (baseline)    90%     73.34%   64%    69%
RankMCL (baseline)   90%     83.33%   88%    85%
PTR4EI               100%    90%      94%    96%

For the training dataset, we define two ratings {1,0} cor- 
responding to "relevant to the outbreak" and "not-relevant 
to the outbreak" in order to compute NDCG scores. 
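With binary ratings, the three measures of Eqs. (3) to (5) can be computed directly from the list of ratings in ranked order. The sketch below is one straightforward reading of those definitions. The function names and the example ranking are hypothetical, AP is normalized by the number of relevant documents that appear in the ranked list, and the normalization constant is taken as the inverse of the score of the ideally ordered list.

```python
import math


def precision_at(rels, n):
    """Eq. (3): fraction of relevant documents among the top n."""
    return sum(rels[:n]) / n


def average_precision(rels):
    """Eq. (4): mean of P@n over the positions of relevant documents."""
    total_relevant = sum(rels)
    if total_relevant == 0:
        return 0.0
    hits = sum(precision_at(rels, n) * rel
               for n, rel in enumerate(rels, start=1))
    return hits / total_relevant


def mean_average_precision(rels_per_query):
    """MAP: average of the AP values over a set of queries."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)


def dcg_at(rels, n):
    """Unnormalized sum of Eq. (5)."""
    return sum((2 ** r - 1) / math.log(1 + j)
               for j, r in enumerate(rels[:n], start=1))


def ndcg_at(rels, n):
    """Eq. (5), with Z_n set to 1 / DCG of the ideally ordered list."""
    ideal = dcg_at(sorted(rels, reverse=True), n)
    return dcg_at(rels, n) / ideal if ideal > 0 else 0.0


# Hypothetical ranked list: 1 = relevant to the outbreak, 0 = not relevant.
ranking = [1, 0, 1, 1, 0]

print(precision_at(ranking, 3))
print(average_precision(ranking))
print(ndcg_at(ranking, 5))
```

The base of the logarithm cancels in the ratio, so any base gives the same NDCG value.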

Results. The ranking performance in terms of precision
is presented in Table 6; MAP and NDCG results are
shown in Figure 3. PTR4EI outperforms both baselines.
Local information helps RankMCL
to beat RankMC; for example, MAP improves from 71.96%
(RankMC) to 81.82% (RankMCL). Besides local features,
PTR4EI exploits complementary context information
and Twitter-specific features, such as the presence of
hash-tags or URLs in the tweets. This information allows it to
improve its ranking performance even further, reaching a
MAP of 91.80%. A similar behavior is observed for
precision and NDCG, where PTR4EI is statistically significantly
better than RankMC and RankMCL.

4 Related Work 

Supervised (Stewart, Smith, and Nejdl 2011), unsupervised 
(Fisichella et al. 2011) and rule-based approaches have been 
used to extract public health events from social media and 
news. For example, PULS (Steinberger et al. 2008) identifies 
the disease, time, location and cases of a news-reported 
event. It is integrated into MedISys, which automatically 
collects news articles concerning public health in various 
languages, and aggregates the extracted facts according to 
pre-defined categories in a multilingual manner. 

Other systems have sought to use the web and social media 
as a predictor to monitor and gauge the seasonal patterns 
of influenza. These systems correlate the queries issued to 
search engines with statistics on the infection rates of 
influenza-like illnesses (Polgreen et al. 2008; Ginsberg et 
al. 2009). 
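
The kind of correlation analysis these systems rely on can be illustrated with a plain Pearson coefficient between two weekly time series. This is only a sketch: Pearson correlation is one common choice and the cited systems may use different models, and the function name and input series are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series,
    e.g. weekly query counts and weekly influenza-like illness rates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to 1 indicates that query volume and reported illness rates rise and fall together over the observed weeks.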

Monitoring analysis has also been carried out on Twitter. 
Chew and Eysenbach focused on the use of the terms 
"H1N1" and "swine flu" during the 2009 H1N1 outbreak 
(Chew and Eysenbach 2009). They showed that the concise 
and timely nature of tweets can provide health officials 
with a means to become aware of, and respond to, concerns 
raised by the public. 

Culotta applied text classification to filter out tweets that 
do not report on influenza-like illnesses. Further, the study 
modeled influenza rates with regression models and compared 
them to U.S. Centers for Disease Control and Prevention 
statistics (Culotta 2010). 

Lampos and Cristianini also presented a monitoring tool 
for social media that is based on the textual analysis of 
micro-blog content (Lampos and Cristianini 2010; Lampos, 
Bie, and Cristianini 2010). Their study focused on 
influenza-like illnesses in the UK and showed a correlation 
with data from the Health Protection Agency. Another study 
of Twitter content concentrated on influenza-like illnesses 
in the U.S. (Signorini, Segre, and Polgreen 2011). Paul and 
Dredze (Paul and Dredze 2011a; 2011b) introduced a new 
aspect topic model for Twitter that associates symptoms, 
treatments and general words with diseases. Their focus is 
on general public health, not necessarily infectious diseases 
or disease outbreaks. 

In contrast to these systems, we seek not only to detect 
and monitor potential public health threats, but also to 
provide support for public health officials to assess the 
potential risk associated with the volume of information that 
is available within Twitter streams. Moreover, our proposed 
approach shows the potential of using Twitter for monitoring 
non-seasonal outbreaks in geo-spatially sparse tweet 
locations. 

Our work is similar to that of Linge et al. (2011), where 
media reports on the 2011 EHEC outbreak in Germany are 
tracked. Although no early warning was possible in their 
work, they identified key aspects of developing outbreak 
stories. In contrast, our approach exploits social media 
data, and we show that such a system can help to obtain early 
warnings of public health threats. 

Although some works address the task of ranking tweets, 
little effort has been devoted to exploring personalized 
ranking of tweets in the domain of epidemic intelligence. 
For example, Duan et al. rank individual generic tweets 
according to their relevance to a given query (Duan et al. 
2010). The features used include content relevance features, 
Twitter-specific features and account authority features. In 
contrast, ours is a personalized learning to rank approach 
for epidemic intelligence that exploits an expanded user 
context by means of latent topics and social hash-tagging 
behavior. 

5 Conclusion and Future Directions 

To show the potential of Twitter for early warning, we 
focused on the recent EHEC/HUS outbreak in Germany and 
monitored the social stream. We applied several 
biosurveillance methods to a set of tweets collected in real 
time during the event using the Twitter API. All the detection 
methods triggered an alarm on May 20, a day ahead of 
well-established early warning systems such as MedISys. 

After the detection of the outbreak, authorities investigating 
the cause and the impact on the population were interested in 
the analysis of micro-blog data related to the event. 
Thousands of tweets were produced every day, which made this 
task overwhelming for the experts. In this work, we proposed a 
Personalized Tweet Ranking algorithm for Epidemic Intelligence 
(PTR4EI) that provides users with a personalized short list of 
tweets that matches the context of their investigation. PTR4EI 
exploits features that go beyond the medical condition and 
location (i.e., the user context) and include complementary 
context information, extracted using LDA and the social 
hash-tagging behavior in Twitter, plus additional 
Twitter-specific features. Our experimental evaluation showed 
the superior ranking performance of PTR4EI. 

We are currently working closely with German and global 
public health institutions to help them integrate the 
monitoring of social media into their existing surveillance 
systems. 

As future work, we plan to scale up our experiments, and 
to apply techniques of online ranking in order to update the 
model more efficiently as the outbreak develops. 

We have shown the potential of Twitter to trigger early 
warnings in the case of sudden outbreaks and how personal- 
ized ranking for epidemic intelligence can be achieved. We 
believe our work can serve as a building block for an open 
early warning system based on Twitter, and hope that this 
paper provides some insights into the future of epidemic in- 
telligence based on social media streams. 